Abstract. Let PT-DFA mean a deterministic finite automaton whose transition relation
is a partial function. We present an algorithm for minimizing a PT-DFA in $O(m \lg n)$ time
and $O(m + n + \alpha)$ memory, where $n$ is the number of states, $m$ is the number of defined
transitions, and $\alpha$ is the size of the alphabet. Time consumption does not depend on $\alpha$,
because the $\alpha$ term arises from an array that is accessed at random and never initialized.
The array is not needed if transitions are in a suitable order in the input. The algorithm uses
two instances of an array-based data structure for maintaining a refinable partition. Its
operations are all amortized constant time. One instance represents the classical blocks and
the other a partition of transitions. Our measurements demonstrate the speed advantage
of our algorithm on PT-DFAs over an $O(\alpha n \lg n)$ time, $O(\alpha n)$ memory algorithm.



1. Introduction 

Minimization of a deterministic finite automaton (DFA) is a classic problem in computer
science. Let $n$ be the number of states, $m$ the number of transitions and $\alpha$ the size of the
alphabet of the DFA. Hopcroft made a breakthrough in 1970 by presenting an algorithm
that runs in $O(n \lg n)$ time, treating $\alpha$ as a constant [5]. Gries made the dependence of
the running time of the algorithm on $\alpha$ explicit, obtaining $O(\alpha n \lg n)$ [3]. (Complexity is
reported using the RAM machine model under the uniform cost criterion [1, p. 12].)

Our starting point was the paper by Knuutila in 2001, where he presented yet another
$O(\alpha n \lg n)$ algorithm, and remarked that some versions which have been believed to run
within this time bound actually fail to do so [6]. Hopcroft's algorithm is based on using only
the "smaller" half of some set (known as block) that has been split. Knuutila demonstrated
with an example that although the most well-known notion of "smaller" automatically leads
to $O(\alpha n \lg n)$, two other notions that have been used may yield $\Omega(n^3)$ when $\alpha = \frac{1}{2}n$. He
also showed that this can be avoided by maintaining, for each symbol, the set of those
states in the block that have input transitions labelled by that symbol. According to [3],
Hopcroft's original algorithm did so. Some later authors have dropped this complication as
unnecessary, although it is necessary when the alternative notions of "smaller" are used.

Key words and phrases: deterministic finite automaton, sparse adjacency matrix, partition refinement. 
Petri Lehtinen was funded by Academy of Finland, project ALEA (210795). 



ANTTI VALMARI AND PETRI LEHTINEN




© A. Valmari and P. Lehtinen

© Creative Commons Attribution-NoDerivs License 






Knuutila mentioned as future work whether his approach can be used to develop an
$O(m \lg n)$ algorithm for DFAs whose transition functions are not necessarily total. For
brevity, we call them PT-DFAs. With an ordinary DFA, $O(m \lg n)$ is the same as $O(\alpha n \lg n)$
as $m = \alpha n$, but with a PT-DFA it may be much better. We present such an algorithm
in this paper. We refined Knuutila's method of maintaining sets of states with relevant
input transitions into a full-fledged data structure for maintaining refinable partitions.
Instead of maintaining those sets of states, our algorithm maintains the corresponding sets of
transitions. Another instance of the structure maintains the blocks.

Knuutila seems to claim that such a PT-DFA algorithm arises from the results in [7],
where an $O(m \lg n)$ algorithm was described for refining a partition against a relation.
However, there $\alpha = 1$, so the solved problem is not an immediate generalisation of ours.
Extending the algorithm to $\alpha > 1$ is not trivial, as can be appreciated from the extension
in [2]. It discusses $O(m \lg n)$ without openly promising it. Indeed, its analysis treats $\alpha$ as a
constant. It seems to us that its running time does have an $\alpha n$ term.

In Section 2 we present an abstract minimization algorithm that, unlike [3, 6], has been
adapted to PT-DFAs and avoids scanning the blocks and the alphabet in nested loops. The
latter is crucial for converting $\alpha n$ into $m$ in the complexity. The question of what blocks
are needed in further splitting has led to lengthy and sometimes unconvincing discussions
in earlier literature. Our correctness proof deals with this issue using the "loop invariant"
paradigm advocated in [3]. Our loop invariant "knows" what blocks are needed.

Section 3 presents an implementation of the refinable partition data structure. Its
performance relies on a carefully chosen combination of simple low-level programming details.

The implementation of the main part of the abstract algorithm is the topic of Section 4.
The analysis of its time consumption is based on proving, for two lines of the code, that
whenever the line is executed again for the same transition, the end state of the transition
resides in a block whose size is at most half the size in the previous time. The numbers of
times the remaining lines are executed are then related to these lines.

With a time bound as tight as ours, the order in which the transitions are presented in
the input becomes significant, since the $O(m \lg m)$ time that typical good sorting algorithms
tend to take does not necessarily fit $O(m \lg n)$. We discuss this problem in Section 5 and
present a solution that runs in $O(m)$ time but may use more memory, namely $O(m + \alpha)$.

Some measurements made with our implementations of Knuutila's and our algorithm
are shown in Section 6.

2. Abstract Algorithm 

A PT-DFA is a 5-tuple $\mathcal{D} = (Q, \Sigma, \delta, \hat{q}, F)$ such that $Q$ and $\Sigma$ are finite sets, $\hat{q} \in Q$,
$F \subseteq Q$ and $\delta$ is explained below. The elements of $Q$ are called states, $\hat{q}$ is the initial state,
and $F$ is the set of final states. The set $\Sigma$ is the alphabet. We have $\delta \subseteq Q \times \Sigma \times Q$, and $\delta$
satisfies the condition that if $(q, a, q_1) \in \delta$ and $(q, a, q_2) \in \delta$, then $q_1 = q_2$. The elements of $\delta$
are transitions. In essence, $\delta$ is a partial function from $Q \times \Sigma$ to $Q$. Therefore, if $(q, a, q') \in \delta$,
we write $\delta(q, a) = q'$. If $q \in Q$ and $a \in \Sigma$ but there is no $q'$ such that $(q, a, q') \in \delta$, we
write $\delta(q, a) = \bot$, where $\bot$ is some symbol satisfying $\bot \notin Q$. We will use $|\delta|$ as the number
of transitions, and this number may be much smaller than $|Q||\Sigma|$, which is the number of
transitions if $\delta$ is a full function.
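As an illustration of the definition above, a PT-DFA can be stored so that memory is proportional to $|\delta|$ rather than $|Q||\Sigma|$. The following is a minimal sketch, assuming a dictionary keyed by (state, symbol) pairs; the class name `PTDFA`, its field names, and the use of `None` in the role of $\bot$ are choices made for this example and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class PTDFA:
    """Hypothetical container for a PT-DFA (Q, Sigma, delta, q_hat, F)."""
    states: set
    alphabet: set
    delta: dict      # (state, symbol) -> state; a missing key means "undefined"
    initial: object
    finals: set

    def step(self, q, a):
        # None plays the role of the bottom symbol, so None must not be a state.
        return self.delta.get((q, a))

    def accepts(self, word):
        q = self.initial
        for a in word:
            q = self.step(q, a)
            if q is None:        # the path is blocked by an undefined transition
                return False
        return q in self.finals


# Three states, two symbols, but only two of the six possible transitions.
example = PTDFA({0, 1, 2}, {"a", "b"}, {(0, "a"): 1, (1, "b"): 2}, 0, {2})
```

Only the defined transitions occupy memory here, which is the situation in which $|\delta|$ is much smaller than $|Q||\Sigma|$.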

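By $q \xrightarrow{a_1 a_2 \cdots a_n} q'$ we denote that there is a path from state $q$ to state $q'$ such
that the labels along the path constitute the word $a_1 a_2 \cdots a_n$. That is, $q \xrightarrow{\varepsilon} q$ holds for
every $q \in Q$, and $q \xrightarrow{a_1 a_2 \cdots a_n a_{n+1}} q'$ holds if and only if there is some $q'' \in Q$ such
that $q \xrightarrow{a_1 a_2 \cdots a_n} q''$ and $\delta(q'', a_{n+1}) = q' \neq \bot$. The language accepted by $\mathcal{D}$ is the set
of words labelling the paths from the initial state to final states, that is,
$\mathcal{L}(\mathcal{D}) = \{ \sigma \in \Sigma^* \mid \exists q \in F : \hat{q} \xrightarrow{\sigma} q \}$. We will also talk about the languages of individual states, that is,
$\mathcal{L}(q) = \{ \sigma \in \Sigma^* \mid \exists q' \in F : q \xrightarrow{\sigma} q' \}$. Obviously $\mathcal{L}(\mathcal{D}) = \mathcal{L}(\hat{q})$.

 1  $(Q, \Sigma, \delta, \hat{q}, F)$ := remove irrelevant states and transitions from $(Q, \Sigma, \delta, \hat{q}, F)$
 2  if $F = \emptyset$ then return empty_DFA($\Sigma$)
 3  else
 4      if $Q = F$ then $\mathcal{B} := \{F\}$ else $\mathcal{B} := \{F, Q - F\}$
 5      $\mathcal{U} := \{ (B, a) \mid B \in \mathcal{B} \wedge a \in \Sigma \wedge \delta_{B,a} \neq \emptyset \}$
 6      while $\mathcal{U} \neq \emptyset$ do
 7          $(B, a)$ := any_element_of($\mathcal{U}$); $\mathcal{U} := \mathcal{U} - \{(B, a)\}$
 8          for $C \in \mathcal{B}$ such that $\exists q \in C : \delta(q, a) \in B$ do
 9              $C_1 := \{ q \in C \mid \delta(q, a) \in B \}$; $C_2 := C - C_1$
10              if $C_2 \neq \emptyset$ then
11                  $\mathcal{B} := \mathcal{B} - \{C\}$; $\mathcal{B} := \mathcal{B} \cup \{C_1, C_2\}$
12                  if $|C_1| \leq |C_2|$ then small := 1; big := 2 else small := 2; big := 1
13                  $\mathcal{U} := \mathcal{U} \cup \{ (C_{small}, b) \mid \delta_{C_{small},b} \neq \emptyset \wedge b \in \Sigma \}$
14                  $\mathcal{U} := \mathcal{U} \cup \{ (C_{big}, b) \mid \delta_{C_{big},b} \neq \emptyset \wedge (C, b) \in \mathcal{U} \}$
15                  $\mathcal{U} := \mathcal{U} - (\{C\} \times \Sigma)$
16      $Q' := \mathcal{B}$; $\delta' := \emptyset$; $\hat{q}' := \mathit{Block}(\hat{q})$; $F' := \emptyset$
17      for $B \in \mathcal{B}$ do
18          $q$ := any_element_of($B$)
19          if $q \in F$ then $F' := F' \cup \{B\}$
20          for $a \in \Sigma$ such that $\delta(q, a) \neq \bot$ do $\delta' := \delta' \cup \{ (B, a, \mathit{Block}(\delta(q, a))) \}$
21      return $(Q', \Sigma, \delta', \hat{q}', F')$

Figure 1: Abstract PT-DFA minimization algorithm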

We say that a state is relevant if and only if either it is the initial state, or it is
reachable from the initial state and some final state is reachable from it. More precisely,
$R = \{\hat{q}\} \cup \{ q \in Q \mid \exists q' \in F : \exists \sigma \in \Sigma^* : \exists \rho \in \Sigma^* : \hat{q} \xrightarrow{\sigma} q \xrightarrow{\rho} q' \}$. It is obvious that
irrelevant states and their adjacent transitions may be removed from a PT-DFA without
affecting its language. The initial state cannot be removed, because otherwise the result
would violate the condition $\hat{q} \in Q$ in the definition of a DFA. The removal yields the
PT-DFA $(R, \Sigma, \delta', \hat{q}, F')$, where $\delta' = \delta \cap (R \times \Sigma \times R)$ and $F' = F \cap R$.

If no final state is reachable from the initial state, then $\mathcal{L}(\mathcal{D}) = \emptyset$. This is handled as a
special case in our algorithm, because otherwise the result might contain unnecessary
transitions from the initial state to itself. For this purpose, let empty_DFA($\Sigma$) be $(\{x\}, \Sigma, \emptyset, x, \emptyset)$,
where $x$ is just any element. Obviously empty_DFA($\Sigma$) is the smallest PT-DFA with the
alphabet $\Sigma$ that accepts the empty language.
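The set $R$ of relevant states can be computed with one forward and one backward graph search. The sketch below is one possible way to do this, assuming the transition function is given as a dictionary from (state, symbol) pairs to states; the function names `relevant_states` and `trim` are hypothetical and chosen for this example only.

```python
def relevant_states(states, delta, initial, finals):
    """Return R: the initial state plus every state that is reachable from
    it and from which some final state is reachable."""
    succ = {q: [] for q in states}
    pred = {q: [] for q in states}
    for (q, _a), q2 in delta.items():
        succ[q].append(q2)
        pred[q2].append(q)

    def closure(start, edges):
        # Plain depth-first search over the given adjacency lists.
        seen = set(start)
        stack = list(start)
        while stack:
            q = stack.pop()
            for r in edges[q]:
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    forward = closure([initial], succ)    # reachable from the initial state
    backward = closure(finals, pred)      # can reach some final state
    return {initial} | (forward & backward)


def trim(states, delta, initial, finals):
    """Restrict the automaton to R, as in (R, Sigma, delta', q_hat, F')."""
    r = relevant_states(states, delta, initial, finals)
    new_delta = {(q, a): q2 for (q, a), q2 in delta.items()
                 if q in r and q2 in r}
    return r, new_delta, initial, set(finals) & r
```

Both searches visit each state and transition at most once, so this sketch runs in $O(|Q| + |\delta|)$ time, up to the cost of dictionary operations.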

The abstract minimization algorithm is shown in Figure 1. In it, $\mathcal{B}$ denotes a partition
on $Q$. That is, $\mathcal{B}$ is a collection $\{B_1, B_2, \ldots, B_n\}$ of nonempty subsets of $Q$ such that
$B_1 \cup B_2 \cup \cdots \cup B_n = Q$, and $B_i \cap B_j = \emptyset$ whenever $1 \leq i < j \leq n$. The elements of $\mathcal{B}$ are
called blocks. By checking all statements that modify the contents of $\mathcal{B}$, it is easy to verify
that after its initialization on line 4, $\mathcal{B}$ is a partition on $Q$ throughout the execution of the
algorithm, except temporarily in the middle of line 11.






By $\mathit{Block}(q)$ we denote the block to which state $q$ belongs. Therefore, if $q \in Q$, then
$q \in \mathit{Block}(q) \in \mathcal{B}$. For convenience, we define $\mathit{Block}(\bot) = \bot \notin \mathcal{B}$. If $\mathit{Block}(q_1) \neq \mathit{Block}(q_2)$
ever starts to hold, then it stays valid up to the end of the execution of the algorithm.

Elements of $\mathcal{B} \times \Sigma$ are called splitters. Let $\delta_{B,a} = \{ (q, a, q') \in \delta \mid q' \in B \}$. We say
that splitter $(B, a)$ is nonempty if and only if $\delta_{B,a} \neq \emptyset$. The set $\mathcal{U}$ contains those nonempty
splitters that are currently "unprocessed". It is obvious from line 8 that empty splitters
would have no effect. The main loop of the algorithm (lines 6...15) starts with all nonempty
splitters as unprocessed, and ends when no nonempty splitter is unprocessed. The classic
algorithm uses either only $F$ or only $Q - F$ for constructing the initial splitters, but this
does not work with a partial $\delta$.
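To make the roles of blocks and splitters concrete, the main loop of Figure 1 (lines 4 to 15) can be transcribed almost literally. The following sketch is deliberately naive: it tests every block against every splitter and recomputes $\delta_{B,a}$ from scratch, so it does not meet the time bound of the paper. It assumes that irrelevant states have already been removed, that $F$ is nonempty, and that `None` is not a state; the function name `abstract_partition` is hypothetical.

```python
def abstract_partition(states, alphabet, delta, finals):
    """Naive transcription of lines 4-15 of Figure 1.
    delta maps (state, symbol) -> state; undefined pairs are absent."""
    f = frozenset(finals)
    rest = frozenset(states) - f
    blocks = {f} if not rest else {f, rest}                      # line 4

    def nonempty(block, a):
        # Is delta_{block,a} nonempty, i.e. does some a-transition end in block?
        return any(delta.get((q, a)) in block for q in states)

    unprocessed = {(b, a) for b in blocks for a in alphabet
                   if nonempty(b, a)}                            # line 5
    while unprocessed:                                           # line 6
        splitter, a = unprocessed.pop()                          # line 7
        for c in list(blocks):                                   # line 8
            c1 = frozenset(q for q in c if delta.get((q, a)) in splitter)
            c2 = c - c1                                          # line 9
            if c1 and c2:                                        # line 10
                blocks.remove(c)                                 # line 11
                blocks |= {c1, c2}
                small, big = (c1, c2) if len(c1) <= len(c2) else (c2, c1)
                for b in alphabet:
                    if (c, b) in unprocessed and nonempty(big, b):
                        unprocessed.add((big, b))                # line 14
                    if nonempty(small, b):
                        unprocessed.add((small, b))              # line 13
                    unprocessed.discard((c, b))                  # line 15
    return blocks
```

In the usage below, states 1 and 2 accept the same language $\{a\}$ and end up in the same block, while states 0 and 3 are separated from them and from each other.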

The goal of the main loop is to split blocks until they are consistent with $\delta$, without
splitting too much. We will now prove in two steps that this is achieved.

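Lemma 2.1. For every $q_1 \in Q$ and $q_2 \in Q$, if $\mathit{Block}(q_1) \neq \mathit{Block}(q_2)$ at any time of the
execution of the algorithm in Figure 1, then $\mathcal{L}(q_1) \neq \mathcal{L}(q_2)$.

Proof. If the algorithm puts states $q_1$ and $q_2$ into different blocks on line 4, then either
$\varepsilon \in \mathcal{L}(q_1) \wedge \varepsilon \notin \mathcal{L}(q_2)$ or $\varepsilon \notin \mathcal{L}(q_1) \wedge \varepsilon \in \mathcal{L}(q_2)$. Otherwise, it does so on line 11. Then there
are $i$, $j$, $B$ and $a$ such that $\{i, j\} = \{1, 2\}$, $\delta(q_i, a) \in B$ and $\delta(q_j, a) \notin B$. Let $q_i' = \delta(q_i, a)$.

If $\delta(q_j, a) \neq \bot$, then let $q_j' = \delta(q_j, a)$. We have $q_j' \notin B$. Because the algorithm has
already put $q_i'$ and $q_j'$ into different blocks (they were in different blocks on line 9), there is
some $\sigma \in \Sigma^*$ such that either $\sigma \in \mathcal{L}(q_i') \wedge \sigma \notin \mathcal{L}(q_j')$ or vice versa. As a consequence, $a\sigma$ is
in $\mathcal{L}(q_1)$ or in $\mathcal{L}(q_2)$, but not in both.

Assume now that $\delta(q_j, a) = \bot$. Because of lines 1 and 2, $\mathcal{L}(q) \neq \emptyset$ for every $q \in Q$.
There is thus some $\sigma \in \Sigma^*$ such that $\sigma \in \mathcal{L}(q_i')$. We have $a\sigma \in \mathcal{L}(q_i)$. Clearly $a\sigma \notin \mathcal{L}(q_j)$. ∎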

At this point it is worth noticing that line 1 is important for the correctness of the
algorithm. Without it, there could be two reachable states $q_1$ and $q_2$ that accept the same
language, and $a$ such that $\delta(q_1, a) = \bot$ while $\delta(q_2, a)$ is a state that accepts the empty
language. The algorithm would eventually put $q_1$ and $q_2$ into different blocks.

We have shown that the main loop does not split blocks when it should not. We now 
prove that it splits all the blocks that it should. 

Lemma 2.2. At the end of the algorithm in Figure 1, for every $q_1 \in Q$, $q_2 \in Q$ and $a \in \Sigma$,
if $\mathit{Block}(q_1) = \mathit{Block}(q_2)$, then $\mathit{Block}(\delta(q_1, a)) = \mathit{Block}(\delta(q_2, a))$.

Proof. To improve readability, let $B_1 = \mathit{Block}(\delta(q_1, a))$ and $B_2 = \mathit{Block}(\delta(q_2, a))$. In the
proof, $\mathit{Block}()$, $B_1$ and $B_2$ are always evaluated with the current $\mathcal{B}$, so their contents change.
The proof is based on the following loop invariant:

    On line 6, for every $q_1 \in Q$, $q_2 \in Q$ and $a \in \Sigma$, if $\mathit{Block}(q_1) = \mathit{Block}(q_2)$,
    then $B_1 = B_2$ or $(B_1, a) \in \mathcal{U}$ or $(B_2, a) \in \mathcal{U}$.

Consider the situation immediately after line 5. If $B_1 \neq \bot$, then $(B_1, a) \in \mathcal{U}$. If $B_2 \neq \bot$,
then $(B_2, a) \in \mathcal{U}$. If $B_1 = B_2 = \bot$, then $B_1 = B_2$. Thus the invariant holds initially.

Consider any $q_1$, $q_2$, $a$ and instance of executing line 6 such that the invariant holds.
Our task is to show that the invariant holds for them also when line 6 is executed for the
next time.

The case that the invariant holds because $\mathit{Block}(q_1) \neq \mathit{Block}(q_2)$ is simple. Blocks are
never merged, so $\mathit{Block}(q_1) \neq \mathit{Block}(q_2)$ is valid also the next time.

Consider the case $\mathit{Block}(q_1) = \mathit{Block}(q_2)$, $B_1 \neq B_2$ and $(B_i, a) \in \mathcal{U}$, where $i = 1$ or
$i = 2$. Let $j = 3 - i$. If $(B_i, a)$ is the $(B, a)$ of line 7, then, when $\mathit{Block}(q_i)$ is the $C$ of the
for-loop, $q_i$ goes to $C_1$ and $q_j$ goes to $C_2$. So $\mathit{Block}(q_1) = \mathit{Block}(q_2)$ ceases to hold, rescuing
the invariant. If $(B_i, a)$ is not the $(B, a)$ of line 7, then, whenever $B_i$ is split, lines 13 and 14
take care that both halves end up in $\mathcal{U}$. Thus $(B_i, a) \in \mathcal{U}$ stays true, keeping the invariant
valid, although $B_i$ and $\mathcal{U}$ may change.

Let now $\mathit{Block}(q_1) = \mathit{Block}(q_2)$ and $B_1 = B_2$. To invalidate the invariant, $B_1$ or $B_2$
must be changed so that $B_1 = B_2$ ceases to hold. When this happens, line 13 puts $(B_i, a)$
into $\mathcal{U}$, where $i = 1$ or $i = 2$. Like above, lines 13 and 14 keep $(B_i, a)$ in $\mathcal{U}$ although $B_i$ may
change until line 6 is entered again.

We have completed the proof that the invariant stays valid.

When line 16 is entered, $\mathcal{U} = \emptyset$. The invariant now yields that if $\mathit{Block}(q_1) = \mathit{Block}(q_2)$,
then $\mathit{Block}(\delta(q_1, a)) = \mathit{Block}(\delta(q_2, a))$. ∎

It is not difficult to check that lines 16...20 yield a PT-DFA, that is, $Q'$ and $\Sigma$ are
finite sets and so on. In particular, the construction gives $\delta'(B, a)$ a value at most once.
We now show that the result is the right PT-DFA.

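Lemma 2.3. Let $\mathcal{D}' = (Q', \Sigma, \delta', \hat{q}', F')$ be the result of the algorithm in Figure 1. We have
$\mathcal{L}(\mathcal{D}') = \mathcal{L}(\mathcal{D})$. Furthermore, every PT-DFA that accepts $\mathcal{L}(\mathcal{D})$ has at least as many states
and transitions as $\mathcal{D}'$. If it has the same number of states, it is either isomorphic with $\mathcal{D}'$
(ignoring $\Sigma$ in the comparison), or it is of the form $(\{\hat{q}''\}, \Sigma'', \delta'', \hat{q}'', \emptyset)$ with $\delta'' \neq \emptyset$.

Proof. The case where the algorithm exits on line 2 is trivial and has been discussed, so
from now on we discuss the case where the algorithm goes through the main part.

Let $q \in Q$ and $a \in \Sigma$. Lemma 2.2 implies that $\mathit{Block}(\delta(q, a)) = \mathit{Block}(\delta(q', a))$ for every
$q' \in \mathit{Block}(q)$. From this line 20 yields $\delta'(\mathit{Block}(q), a) = \mathit{Block}(\delta(q, a))$. By induction, if
$\sigma \in \Sigma^*$, $q' \in Q$ and $q \xrightarrow{\sigma} q'$ in $\mathcal{D}$, then $\mathit{Block}(q) \xrightarrow{\sigma} \mathit{Block}(q')$ in $\mathcal{D}'$, and if
$\mathit{Block}(q) \xrightarrow{\sigma} B \neq \bot$ in $\mathcal{D}'$, then there is $q' \in Q$ such that $B = \mathit{Block}(q')$ and $q \xrightarrow{\sigma} q'$ in $\mathcal{D}$. Similarly,
lines 4 and 19 guarantee that $q' \in F$ if and only if $\mathit{Block}(q') \in F'$. Together these yield
$\mathcal{L}(q) = \mathcal{L}(\mathit{Block}(q))$ and, in particular, $\mathcal{L}(\mathcal{D}') = \mathcal{L}(\hat{q}') = \mathcal{L}(\hat{q}) = \mathcal{L}(\mathcal{D})$.

Let $(Q'', \Sigma'', \delta'', \hat{q}'', F'')$ be any PT-DFA that accepts the same language as $\mathcal{D}'$. Let
$q' \in Q'$. Because the algorithm executed the main part, there are some $\sigma \in \Sigma^*$ and $\rho \in \Sigma^*$
such that $\hat{q}' \xrightarrow{\sigma} q'$ and $\rho \in \mathcal{L}(q')$. So $\sigma\rho \in \mathcal{L}(\hat{q}') = \mathcal{L}(\hat{q}'')$, and also $Q''$ contains a state
$q''$ such that $\hat{q}'' \xrightarrow{\sigma} q''$ and $\mathcal{L}(q'') = \mathcal{L}(q')$. As $\sigma$ may vary, there may be many $q''$ with
$\mathcal{L}(q'') = \mathcal{L}(q')$. We arbitrarily choose one of them and denote it with $f(q')$. Lemma 2.1
implies that if $q_1' \neq q_2'$, then $\mathcal{L}(q_1') \neq \mathcal{L}(q_2')$, yielding $f(q_1') \neq f(q_2')$. So $|Q''| \geq |Q'|$. If
$\delta'(q', a) \neq \bot$, then some $a\rho' \in \mathcal{L}(q') = \mathcal{L}(f(q'))$, so $\delta''(f(q'), a) \neq \bot$. As a consequence,
$|\delta''| \geq |\delta'|$.

If $|Q''| = |Q'|$, then $f$ is an isomorphism. ∎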

The proof has the consequence that after the end of the main loop, $\mathit{Block}(q_1) =
\mathit{Block}(q_2)$ if and only if $\mathcal{L}(q_1) = \mathcal{L}(q_2)$.

Let us consider the number of times a transition $(q, a, q')$ can be used on line 9. It is
used whenever such a $(B, a)$ is taken from $\mathcal{U}$ that $q' \in B$, that is, $\mathit{Block}(q') = B$. So, shortly
before using $(q, a, q')$, $(\mathit{Block}(q'), a) \in \mathcal{U}$ held but ceased to hold (line 7). To use it again,
$(\mathit{Block}(q'), a) \in \mathcal{U}$ must be made to hold again. For that,
line 13 or 14 must be executed such that $\mathit{Block}(q')$ is in the role of $C_{small}$ or $C_{big}$, and $a$ is in
the role of $b$. But line 14 tests that $(C, b) \in \mathcal{U}$, so it cannot make $(\mathit{Block}(q'), a) \in \mathcal{U}$ hold
if it did not hold already on line 9, although it can keep $(\mathit{Block}(q'), a) \in \mathcal{U}$ valid. So only
line 13 can make $(\mathit{Block}(q'), a) \in \mathcal{U}$ hold again. An important detail of the algorithm
is that line 13 puts the smaller half of $C$ (paired with $a$) into $\mathcal{U}$. Therefore, each time
$(\mathit{Block}(q'), a) \in \mathcal{U}$ starts to hold again, $q'$ resides in a block whose size is at most half of
the size in the previous time. As a consequence, $(q, a, q')$ can be used for splitting at most
$\lg |Q| + 1$ times.
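The bound on the number of uses follows from the halving argument in a few steps. As a sketch, let $s_k$ denote the size of $\mathit{Block}(q')$ at the moment the transition $(q, a, q')$ is used for the $k$-th time; the symbol $s_k$ is introduced here only for this derivation.

```latex
\begin{align*}
  s_1 &\le |Q|, \qquad s_{k+1} \le \tfrac{1}{2}\, s_k
      && \text{(line 13 re-inserts only the smaller half)} \\
  1 \le s_k &\le \frac{|Q|}{2^{\,k-1}}
      && \text{(blocks are nonempty)} \\
  2^{\,k-1} &\le |Q| \;\Longrightarrow\; k \le \lg |Q| + 1 .
\end{align*}
```

Summing over all transitions gives at most $|\delta|(\lg |Q| + 1)$ uses on line 9 in total.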

3. Refinable Partitions 

The refinable partition data structure maintains a partition of the set $\{1, \ldots, \mathit{max}\}$.
Our algorithm uses one instance of it with $\mathit{max} = |Q|$ for the blocks and another with
$\mathit{max} = |\delta|$ for the splitters. Each set in the partition has an index in the range $1, \ldots, \mathit{sets}$,
where $\mathit{sets}$ is the current number of sets. The structure supports the following operations.

Size(s): Returns the number of elements in the set with index s.

Set(e): Returns the index of the set that element e belongs to.

First(s) and Next(e): The elements of the set s can be scanned by executing first
    e := First(s) and then while e $\neq$ 0 do e := Next(e). Each element will be returned
    exactly once, but the ordering in which they are returned is unspecified. While
    scanning a set, Mark and Split must not be executed.

Mark(e): Marks the element e for splitting of a set.

Split(s): If either none or all elements of set s have been marked, returns 0. Otherwise
    removes the marked elements from the set, makes a new set of the marked elements,
    and returns its index. In both cases, unmarks all the elements in the set or sets.

No_marks(s): Returns True if and only if none of the elements of s is marked.

The implementation uses the following max-element arrays.

elems: Contains $1, \ldots, \mathit{max}$ in such an order that elements that belong to the same
    set are next to each other.

loc: Tells the location of each element in elems, that is, elems[loc[e]] = e.

sidx: The index of the set that e belongs to is sidx[e].

first and end: The elements of set s are elems[f], elems[f + 1], ..., elems[$\ell$], where
    f = first[s] and $\ell$ = end[s] - 1.

mid: Let f and $\ell$ be as above, and let m = mid[s]. The marked elements are elems[f],
    ..., elems[m - 1], and the unmarked are elems[m], ..., elems[$\ell$].

Initially sets = 1, first[1] = mid[1] = 1, end[1] = max + 1, and elems[e] = loc[e] = e and
sidx[e] = 1 for $e \in \{1, \ldots, \mathit{max}\}$. Initialization takes $O(\mathit{max})$ time and $O(1)$ additional
memory.

The implementation of the operations is shown in Figure 2. Each operation runs in
constant time, except Split, whose worst-case time consumption is linear in the number $M$
of marked elements. However, Split can also be treated as constant-time in the analysis of
our algorithm, because it is amortized constant time. When Split is called, there have been
$M$ calls of Mark. They are unique to this call of Split, because Split unmarks the elements
in question. The total time consumption of these calls of Mark and Split is $O(M)$, but the
same result is obtained even if Split is treated as constant-time.
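The operations and arrays described above can be written down compactly in an executable form. The following Python class is a sketch under a few assumptions made for this example: arrays are 1-based with index 0 left unused so that 0 can serve as the "none" value, the per-set arrays `first`, `mid` and `end` grow on demand instead of being preallocated to `max` entries, the partitioned set is assumed to be nonempty, and the method names (`set_of`, `first_elem`, `next_elem`, `members`) are hypothetical.

```python
class RefinablePartition:
    """Partition of {1, ..., max_elem}; index 0 of every array is unused."""

    def __init__(self, max_elem):
        self.sets = 1
        self.elems = list(range(max_elem + 1))   # elems[i] = i
        self.loc = list(range(max_elem + 1))     # elems[loc[e]] == e
        self.sidx = [1] * (max_elem + 1)         # every element is in set 1
        self.first = [0, 1]
        self.mid = [0, 1]
        self.end = [0, max_elem + 1]

    def size(self, s):
        return self.end[s] - self.first[s]

    def set_of(self, e):
        return self.sidx[e]

    def first_elem(self, s):
        return self.elems[self.first[s]]

    def next_elem(self, e):
        i = self.loc[e] + 1
        return 0 if i >= self.end[self.sidx[e]] else self.elems[i]

    def members(self, s):
        # Convenience scan built from first_elem / next_elem.
        out, e = [], self.first_elem(s)
        while e != 0:
            out.append(e)
            e = self.next_elem(e)
        return out

    def mark(self, e):
        s, l = self.sidx[e], self.loc[e]
        m = self.mid[s]
        if l >= m:                       # e is unmarked: swap it to position m
            self.elems[l] = self.elems[m]
            self.loc[self.elems[l]] = l
            self.elems[m] = e
            self.loc[e] = m
            self.mid[s] = m + 1

    def split(self, s):
        if self.mid[s] == self.end[s]:   # everything marked: just unmark
            self.mid[s] = self.first[s]
        if self.mid[s] == self.first[s]:
            return 0
        self.sets += 1                   # the marked prefix becomes a new set
        self.first.append(self.first[s])
        self.mid.append(self.first[s])
        self.end.append(self.mid[s])
        self.first[s] = self.mid[s]
        for i in range(self.first[self.sets], self.end[self.sets]):
            self.sidx[self.elems[i]] = self.sets
        return self.sets

    def no_marks(self, s):
        return self.mid[s] == self.first[s]
```

Marking only swaps two entries of `elems`, and `split` touches only the marked elements, which is what makes the amortized constant-time argument above go through.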

Size(s)
    return end[s] - first[s]

Set(e)
    return sidx[e]

First(s)
    return elems[first[s]]

Next(e)
    if loc[e] + 1 $\geq$ end[sidx[e]] then return 0
    else return elems[loc[e] + 1]

Mark(e)
    s := sidx[e]; $\ell$ := loc[e]; m := mid[s]
    if $\ell \geq$ m then
        elems[$\ell$] := elems[m]; loc[elems[$\ell$]] := $\ell$
        elems[m] := e; loc[e] := m; mid[s] := m + 1

Split(s)
    if mid[s] = end[s] then mid[s] := first[s]
    if mid[s] = first[s] then return 0
    else
        sets := sets + 1
        first[sets] := first[s]; mid[sets] := first[s]; end[sets] := mid[s]
        first[s] := mid[s]
        for $\ell$ := first[sets] to end[sets] - 1 do sidx[elems[$\ell$]] := sets
        return sets

No_marks(s)
    if mid[s] = first[s] then return True
    else return False

Figure 2: Implementation of the refinable partition data structure

4. Block-splitting Stage

In this section we show how lines 4...15 of the abstract algorithm can be implemented
in $O(|\delta| \lg |Q|)$ time and $O(|\delta|)$ memory assuming that $\delta$ is available in a suitable ordering.
The implementation of abstract lines 1...3 and 16...21 in $O(|Q| + |\delta|)$ time and memory
is easy and not discussed further in this paper. (By "abstract lines" we refer to lines in
Figure 1.)

The implementation relies on the following data structures. The "simple sets" among
them are all initially empty. They have only three operations, all $O(1)$ time: the set is empty
if and only if Empty returns True, Add(i) adds number i to the set without checking if it
already is there, and Remove removes any number from the set and returns the removed
number. The implementation may choose freely the element that Remove removes and
returns. One possible efficient implementation of a simple set consists of an array that is
used as a stack.
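A simple set of this kind could look as follows. This is a sketch of the array-as-stack idea only; the class name `SimpleSet` and the explicit `capacity` parameter are assumptions of this example.

```python
class SimpleSet:
    """Array used as a stack; supports only empty, add and remove."""

    def __init__(self, capacity):
        self._items = [0] * capacity   # fixed-size backing array
        self._size = 0

    def empty(self):
        return self._size == 0

    def add(self, i):
        # No membership check: callers must avoid adding duplicates.
        self._items[self._size] = i
        self._size += 1

    def remove(self):
        # Removes and returns the most recently added number.
        self._size -= 1
        return self._items[self._size]
```

Because `add` does not check for duplicates, the algorithm guards each insertion with a No_marks test, as described for the Touched sets below.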

tail, label and head: The transitions have the indices $1, \ldots, |\delta|$. If t is the index of
    the transition $(q, a, q')$, then tail[t] = $q$, label[t] = $a$, and head[t] = $q'$.

In_trs: This stores the indices of the input transitions of state q. The ordering of the
    transitions does not matter. This is easy to implement efficiently. For instance, one
    may use an array elems of size $|\delta|$, together with arrays first and end of size $|Q|$, so
    that the indices of the input transitions of q are elems[first[q]], elems[first[q] + 1],
    ..., elems[end[q] - 1]. The array can be initialized in $O(|Q| + |\delta|)$ time with counting
    sort, using head[t] as the key.

BRP: This is a refinable partition data structure on $\{1, \ldots, |Q|\}$. It represents $\mathcal{B}$,
    that is, the blocks. The index of the set in BRP is used as the index of the block
    also elsewhere in the algorithm. Initially BRP consists of one set that contains the
    indices of the states.

TRP: This is a refinable partition data structure on $\{1, \ldots, |\delta|\}$. Each of the sets in
    it consists of the indices of the input transitions of some nonempty splitter $(B, a)$.
    That is, TRP stores $\{ \delta_{B,a} \mid B \in \mathcal{B} \wedge a \in \Sigma \wedge \delta_{B,a} \neq \emptyset \}$. The index of $\delta_{B,a}$ in TRP is
    used as the index of $(B, a)$ also elsewhere in the algorithm. For this reason, we will
    occasionally use the word "splitter" also of the sets in TRP. Initially TRP consists
    of $\{ \delta_{Q,a} \mid a \in \Sigma \wedge \delta_{Q,a} \neq \emptyset \}$, that is, two transitions are in the same set if and only
    if they have the same label. This can be established as follows:

        for $a \in \Sigma$ such that $\delta_{Q,a} \neq \emptyset$ do
            for $t \in \delta_{Q,a}$ do TRP.Mark(t)
            TRP.Split(1)

    If transitions are pre-sorted such that transitions with the same label are next to
    each other, then this runs in $O(|\delta|)$ time and $O(1)$ additional memory.

Unready_Spls: This is a simple set of numbers in the range $1, \ldots, |\delta|$. It stores the
    indices of the unprocessed nonempty splitters. That is, it implements the $\mathcal{U}$ of
    the abstract algorithm. Because each nonempty splitter has at least one incoming
    transition and splitters do not share transitions, $|\delta|$ suffices for the range.

Touched_Blocks: This is a simple set of numbers in the range $1, \ldots, |Q|$. It contains
    the indices of the blocks $C$ that were met when backwards-traversing the incoming
    transitions of the current splitter on abstract line 8. It is always empty on line 19.

Touched_Spls: This is a simple set of numbers in the range $1, \ldots, |\delta|$. It contains the
    indices of the splitters that were affected when scanning the incoming transitions of
    the smaller of the new blocks that resulted from a split. It is empty on line 4.

The block-splitting stage is shown in Figure 3. We explain its operation in the proof of
the following theorem.
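The In_trs structure mentioned in the list above can be built with a counting sort keyed by head[t]. The sketch below shows one way to do it, assuming 1-based state and transition indices with slot 0 unused; the function name `build_in_trs` is hypothetical.

```python
def build_in_trs(num_states, head):
    """head[t] (t = 1..m, head[0] unused) is the end state of transition t.
    Returns (elems, first, end) such that the input transitions of state q
    are elems[first[q]], ..., elems[end[q] - 1]."""
    m = len(head) - 1
    count = [0] * (num_states + 1)
    for t in range(1, m + 1):            # count the input transitions per state
        count[head[t]] += 1
    first = [0] * (num_states + 1)
    pos = 1
    for q in range(1, num_states + 1):   # prefix sums give the start positions
        first[q] = pos
        pos += count[q]
    end = first[:]                       # end[q] grows as transitions are placed
    elems = [0] * (m + 1)
    for t in range(1, m + 1):
        q = head[t]
        elems[end[q]] = t
        end[q] += 1
    return elems, first, end
```

The three passes take $O(|Q| + |\delta|)$ time in total, matching the bound stated for In_trs.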

Theorem 4.1. Given a PT-DFA all whose states are relevant and that has at least one final
state, the algorithm in Figure 3 computes the same $\mathcal{B}$ (represented by BRP) as lines 4...15
of Figure 1.

Proof. Let us first investigate the operation of Split_block. As was told earlier, BRP models
$\mathcal{B}$, TRP models the set of all nonempty splitters (or the sets of their input transitions), and
Unready_Spls models $\mathcal{U}$. The task of Split_block is to update these three variables according
to the splitting of a block $C$. Before calling Split_block, the states $q$ that should go to one
of the halves have been marked by calling BRP.Mark(q) for each of them.

Line 1 unmarks all states of $C$ and either splits $C$ in BRP updating $\mathcal{B}$, or detects that
one of the halves would be empty, so $C$ should not be split. In the latter case, line 2 exits
the procedure. The total effect of the call and its preceding calls of BRP.Mark is zero
(except that the ordering of the states in BRP may have changed).

From now on assume that both halves of $C$ are nonempty. Line 3 makes $b'$ the index
of the smaller half $B'$; we denote the bigger half by $B$. Because $C$ is no more a block,
for each $a \in \Sigma$, the pairs $(C, a)$ are no more splitters, and must be replaced by $(B, a)$ and
$(B', a)$, to the extent that they are nonempty. For this purpose, lines 4, 5 and 10 scan
$B'$ and line 6 scans the incoming transitions of the currently scanned state of $B'$. Line 9
marks, for each $a \in \Sigma$, the transitions that correspond to $(B', a)$. Line 7 finds the index of
$(C, a)$ in TRP, and line 8 adds it to Touched_Spls, unless it is there already. After all input
transitions of $B'$ have been scanned, lines 11 and 12 discharge the set of affected splitters
$(C, a)$. Line 13 updates $(C, a)$ to those of $(B, a)$ and $(B', a)$ that are nonempty.

Split_block(b)
 1  b' := BRP.Split(b)
 2  if b' $\neq$ 0 then
 3      if BRP.Size(b) < BRP.Size(b') then b' := b
 4      q := BRP.First(b')
 5      while q $\neq$ 0 do
 6          for t $\in$ In_trs[q] do
 7              p := TRP.Set(t)
 8              if TRP.No_marks(p) then Touched_Spls.Add(p)
 9              TRP.Mark(t)
10          q := BRP.Next(q)
11      while $\neg$Touched_Spls.Empty do
12          p := Touched_Spls.Remove
13          p' := TRP.Split(p)
14          if p' $\neq$ 0 then Unready_Spls.Add(p')

Main_part
15  Initialize TRP to $\{ \delta_{Q,a} \mid a \in \Sigma \wedge \delta_{Q,a} \neq \emptyset \}$
16  for p := 1 to TRP.sets do Unready_Spls.Add(p)
17  for q $\in$ F do BRP.Mark(q)
18  Split_block(1)
19  while $\neg$Unready_Spls.Empty do
20      p := Unready_Spls.Remove
21      t := TRP.First(p)
22      while t $\neq$ 0 do
23          q := tail[t]; b' := BRP.Set(q)
24          if BRP.No_marks(b') then Touched_Blocks.Add(b')
25          BRP.Mark(q)
26          t := TRP.Next(t)
27      while $\neg$Touched_Blocks.Empty do
28          b := Touched_Blocks.Remove
29          Split_block(b)

Figure 3: Implementation of lines 4...15 of the abstract algorithm

Line 14 corresponds to the updating of $\mathcal{U}$. If both $(B, a)$ and $(B', a)$ are nonempty
splitters, then the index of $(B', a)$ is added to Unready_Spls, that is, $(B', a)$ is added to $\mathcal{U}$.
In this case, $(B, a)$ inherits the index of $(C, a)$ and thus also the presence or absence in $\mathcal{U}$.
If $(B, a)$ is empty, then $(B', a)$ inherits the index and $\mathcal{U}$-status of $(C, a)$. If $(B', a)$ is empty,
then $(C, a)$ does not enter Touched_Spls in the first place. To summarize, if $(C, a) \in \mathcal{U}$,
then all of its nonempty heirs enter $\mathcal{U}$; otherwise only the smaller heir enters $\mathcal{U}$, and only
if it is nonempty. This is equivalent to abstract lines 13...14. Regarding abstract line 15,
$(C, a)$ disappears automatically from $\mathcal{U}$ because its index is re-used.

Lines 15...18 implement the total effect of abstract lines 4...5. The initial value of
BRP corresponds to $\mathcal{B} = \{Q\}$. Line 15 makes TRP contain the sets of input transitions
of all nonempty splitters $(Q, a)$ (where $a \in \Sigma$), and line 16 puts them all to $\mathcal{U}$. If $Q = F$,
then lines 17 and 18 have no effect. Otherwise, they update $\mathcal{B}$ to $\{F, Q - F\}$, update TRP
accordingly, and update Unready_Spls to contain all current nonempty splitters.

Lines 19 and 20 match trivially abstract lines 6 and 7. They choose some nonempty
splitter $(B, a)$ for processing. Lines 21...26 can be thought of as being executed between
abstract lines 7 and 8. They mark the states in $C_1$ for every $C$ that is scanned by abstract
line 8, and collect the indices of those $C$ into Touched_Blocks. Lines 27 and 28 correspond
to abstract line 8, and abstract lines 9...15 are implemented by the call Split_block(b).
Lines 1 and 2 have the same effect as abstract lines 9...11. Line 3 implements abstract
line 12. The description of line 14 presented above matches abstract lines 13...15. ∎

Theorem 4.2. Given a PT-DFA all whose states are relevant and that has at least one
final state, and assuming that the transitions that have the same label are given successively
in the input, the algorithm in Figure 3 runs in $O(|\delta| \lg |Q|)$ time and $O(|\delta|)$ memory.

Proof. The data structures have been listed in this section and they all consume $O(|Q|)$ or
$O(|\delta|)$ memory. Their initialization takes $O(|Q| + |\delta|)$ time. Because all states are relevant,
we have $|Q| \leq |\delta| + 1$, so $O(|Q|)$ terms are also $O(|\delta|)$.

We have already seen that each individual operation in the algorithm runs in amortized
constant time, except for line 15, which takes $O(|\delta|)$ time. We also saw towards the end
of Section 2 that each transition is used at most $\lg |Q| + 1$ times on line 9 of the abstract
algorithm. This implies that line 25, and thus lines 23...26, are executed at most
$|\delta|(\lg |Q| + 1)$ times. The same holds for lines 28 and 29, because the number of Add-operations on
Touched_Blocks is obviously the same as the number of Remove-operations. Because TRP-sets are never
empty, lines 20 and 21 are not executed more often than line 25, and lines 22 and 27 are
executed at most twice as many times as line 25. Line 19 is executed once more than line 20,
and lines 15...18 are executed once. Line 16 runs in $O(|\delta|)$ and line 17 in $O(|Q|)$ time.

Lines 1...4 are executed at most once more than line 29. If BRP.Size(b) $\geq$ BRP.Size(b')
on line 3, then each of the states scanned by lines 5 and 10 was marked on line 17 or 25.
Otherwise the number of scanned states is smaller than the number of marked states.
Therefore, line 10 is executed at most as many times as lines 17 and 25, and line 5 at most twice
as many times. Whenever lines 7...9 are executed anew (or for the first time) for some
transition, the end state of the transition belongs to a block whose size is at most half of
the size in the previous time (or originally), because the block was split on line 1 and the
smaller half was chosen on line 3. Therefore, lines 7...9 are executed at most $|\delta| \lg |Q|$
times. Line 6 is executed as many times as lines 7 and 10 together. The executions of
lines 12...14 are determined by line 8, and of line 11 by lines 4 and 8. ∎
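The whole block-splitting stage can be assembled into a short runnable program. The sketch below follows the structure of Figure 3 but makes several simplifications that are assumptions of this example rather than the paper's method: Python lists serve as the simple sets and as In_trs, a slice of `elems` stands in for the First/Next scan, and the initial grouping of transitions by label uses a dictionary instead of pre-sorted input. States are numbered $1, \ldots, n$, and all names are hypothetical.

```python
class Partition:
    """Refinable partition of {1..n} in the style of Figure 2 (index 0 unused)."""

    def __init__(self, n):
        self.sets = 1
        self.elems = list(range(n + 1))
        self.loc = list(range(n + 1))
        self.sidx = [1] * (n + 1)
        self.first, self.mid, self.end = [0, 1], [0, 1], [0, n + 1]

    def size(self, s):
        return self.end[s] - self.first[s]

    def members(self, s):
        # Snapshot of the set; stands in for the First/Next scan.
        return self.elems[self.first[s]:self.end[s]]

    def no_marks(self, s):
        return self.mid[s] == self.first[s]

    def mark(self, e):
        s, l = self.sidx[e], self.loc[e]
        m = self.mid[s]
        if l >= m:  # e is still unmarked: swap it into the marked prefix
            other = self.elems[m]
            self.elems[l], self.loc[other] = other, l
            self.elems[m], self.loc[e] = e, m
            self.mid[s] = m + 1

    def split(self, s):
        if self.mid[s] == self.end[s]:
            self.mid[s] = self.first[s]
        if self.mid[s] == self.first[s]:
            return 0
        self.sets += 1
        self.first.append(self.first[s])
        self.mid.append(self.first[s])
        self.end.append(self.mid[s])
        self.first[s] = self.mid[s]
        for i in range(self.first[-1], self.end[-1]):
            self.sidx[self.elems[i]] = self.sets
        return self.sets


def split_blocks(n, transitions, finals):
    """transitions: list of (tail, label, head) over states 1..n.
    Returns a list whose entry q-1 is the block index of state q."""
    m = len(transitions)
    tail = [0] + [t[0] for t in transitions]
    in_trs = [[] for _ in range(n + 1)]
    by_label = {}                       # assumption: hashing replaces pre-sorting
    for t, (_, a, h) in enumerate(transitions, start=1):
        in_trs[h].append(t)
        by_label.setdefault(a, []).append(t)
    brp, trp = Partition(n), Partition(m)
    unready, touched_blocks, touched_spls = [], [], []

    def split_block(b):
        b2 = brp.split(b)
        if b2 == 0:
            return
        if brp.size(b) < brp.size(b2):
            b2 = b                      # b2 is now the smaller half
        for q in brp.members(b2):
            for t in in_trs[q]:
                p = trp.sidx[t]
                if trp.no_marks(p):
                    touched_spls.append(p)
                trp.mark(t)
        while touched_spls:
            p2 = trp.split(touched_spls.pop())
            if p2:
                unready.append(p2)

    for ts in by_label.values():        # line 15: one TRP set per label
        for t in ts:
            trp.mark(t)
        trp.split(1)
    unready.extend(range(1, trp.sets + 1))
    for q in finals:
        brp.mark(q)
    split_block(1)
    while unready:
        for t in trp.members(unready.pop()):
            q = tail[t]
            b = brp.sidx[q]
            if brp.no_marks(b):
                touched_blocks.append(b)
            brp.mark(q)
        while touched_blocks:
            split_block(touched_blocks.pop())
    return brp.sidx[1:]
```

For the four-state automaton used in the test, the two middle states receive the same block index, while the initial and the final state each get a block of their own.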

5. Sorting Transitions 

In TRP, transitions are sorted such that those with the same label are next to each 
other. Transitions are not necessarily in such an order in the input. Therefore, we must 
take the resources needed for sorting into account in our analysis. 

    TRP.sets := 0
    for t ∈ δ do
        a := label[t]; i := idx[a]
        if i < 1 ∨ i > TRP.sets ∨ TRP.mid[i] ≠ a then
            i := TRP.sets + 1; TRP.sets := i
            idx[a] := i; TRP.mid[i] := a; TRP.end[i] := 1
        else TRP.end[i] := TRP.end[i] + 1
    TRP.first[1] := 1; TRP.end[1] := TRP.end[1] + 1; TRP.mid[1] := TRP.end[1]
    for i := 2 to TRP.sets do
        TRP.first[i] := TRP.end[i-1]
        TRP.end[i] := TRP.first[i] + TRP.end[i]; TRP.mid[i] := TRP.end[i]
    for t ∈ δ do
        i := idx[label[t]]; ℓ := TRP.mid[i] - 1; TRP.mid[i] := ℓ
        TRP.elems[ℓ] := t; TRP.loc[t] := ℓ; TRP.sidx[t] := i

Figure 4: Initialization of TRP in O(|δ|) time and O(|Σ|) additional memory 

Transitions can of course be sorted according to their labels with heapsort in O(|δ| lg |δ|) 
time and O(|δ|) memory. This is inferior to the time consumption of the rest of the 
algorithm. Because the labels need not be in alphabetical order, a suitable ordering can also 
be found by putting the transitions into a hash table using their labels as the keys. Then 
nonempty hash lists are sorted and concatenated. This takes O(|δ|) time on the average, 
and O(|δ|) memory. However, the worst-case time consumption is still O(|δ| lg |δ|). 
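
The hash-table approach can be illustrated with a minimal Python sketch. The function name group_by_label_hashed and the representation of transitions as indices into a list of labels are assumptions made for this example. Python's dict resolves collisions internally, so each list holds a single label and the per-list sorting step mentioned in the text is not needed in this variant.

```python
from collections import defaultdict

def group_by_label_hashed(labels):
    """Group transition indices so that equal labels are adjacent.

    labels[t] is the label of transition t. Returns (order, bounds):
    order lists the transitions group by group, and bounds[i] is the
    half-open range (start, stop) of the i-th group within order.
    """
    buckets = defaultdict(list)          # label -> transitions with that label
    for t, a in enumerate(labels):
        buckets[a].append(t)
    order, bounds = [], []
    # dict preserves insertion order, so groups appear in first-seen label order;
    # the labels themselves are never compared or sorted alphabetically.
    for ts in buckets.values():
        start = len(order)
        order.extend(ts)
        bounds.append((start, len(order)))
    return order, bounds
```

As in the text, the expected running time is linear in the number of transitions, but this relies on the average-case behaviour of hashing.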

A third possibility runs in O(|δ|) time even in the worst case, but it uses O(|Σ|) 
additional memory. That its time consumption may be smaller than its memory consumption 
arises from the fact that it uses an array idx of size |Σ| that need not be initialized at all, 
not even to all zeros. It is based on counting the occurrences of each label as in exercise 
2.12 of [1], and then continuing like counting sort. The pseudocode is in Figure 4. 
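
The following Python sketch mirrors the three passes of the pseudocode under some stated assumptions. The function name init_trp is hypothetical, positions are 0-based rather than 1-based, and the loc and sidx arrays are omitted for brevity. Python cannot leave memory uninitialized, so the idx array is filled with arbitrary junk to stand in for it; that filling costs O(|Σ|) time here, which a language with raw uninitialized arrays would avoid.

```python
import random

def init_trp(labels, alphabet_size, idx=None):
    """Group transitions by label; labels[t] is in range(alphabet_size).

    idx plays the role of the uninitialized array: any contents are fine.
    Returns (elems, first, end); group i occupies elems[first[i]:end[i]].
    """
    m = len(labels)
    if idx is None:
        # Stand-in for uninitialized memory: arbitrary, possibly misleading values.
        idx = [random.randrange(-3, m + 3) for _ in range(alphabet_size)]
    sets = 0
    first, mid, end = [0] * (m + 1), [0] * (m + 1), [0] * (m + 1)
    # Pass 1: count occurrences. mid[i] temporarily stores the label of set i,
    # so a junk idx[a] is detected by the three-part validity test.
    for a in labels:
        i = idx[a]
        if i < 1 or i > sets or mid[i] != a:
            sets += 1
            i = sets
            idx[a], mid[i], end[i] = i, a, 1
        else:
            end[i] += 1
    # Pass 2: turn the counts into half-open ranges [first[i], end[i]).
    pos = 0
    for i in range(1, sets + 1):
        first[i] = pos
        pos += end[i]
        end[i] = mid[i] = pos
    # Pass 3: place each transition, filling every range from its end backwards.
    elems = [0] * m
    for t, a in enumerate(labels):
        i = idx[a]
        mid[i] -= 1
        elems[mid[i]] = t
    return elems, first[1:sets + 1], end[1:sets + 1]
```

The validity test in the first pass is what makes initialization unnecessary: an entry of idx is trusted only if it points to an already created set whose recorded label matches.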

6. Measurements and Conclusions 

Table 1 shows some measurements made with our test implementations of Knuutila's 
and our algorithm. They were written in C++ and executed on a PC with Linux and 
1 gigabyte of memory. No attempt was made to optimise either implementation to the 
extreme. The implementation of Knuutila's algorithm completes the transition function to 
a full function with a well-known construction. Namely, it adds a "sink" state to which all 
originally absent transitions and all transitions starting from itself are directed. 
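
The sink-state construction can be sketched as follows. The function name complete_with_sink, the 0-based state numbering, and the dictionary representation of the partial transition function are assumptions made for this illustration; they are not taken from either test implementation.

```python
def complete_with_sink(num_states, alphabet_size, delta):
    """Complete a partial transition function with a sink state.

    delta maps (state, label) -> state for the defined transitions only.
    Returns (new_num_states, full), where full[q][a] is defined for every
    state q and label a, and the sink is the state numbered num_states.
    """
    sink = num_states
    # Every entry initially points to the sink, including the sink's own row.
    full = [[sink] * alphabet_size for _ in range(num_states + 1)]
    for (q, a), r in delta.items():
        full[q][a] = r                   # keep the originally defined transitions
    return num_states + 1, full
```

The completed table has (|Q| + 1)|Σ| entries regardless of how few transitions were defined, which is the source of the memory difference between the two approaches on sparse inputs.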

The input DFAs were generated at random. Because of the difficulty of generating a 
precise number of transitions according to the uniform distribution, sometimes the generated 
number of transitions was slightly smaller than the desired number. Furthermore, the DFAs 
may have unreachable states and/or reachable irrelevant states that are processed separately 
by one or both of the algorithms. Running time depends also on the size of the minimized 
DFA: the smaller the result, the less splitting of blocks. We know that the joint effects of 
these phenomena were small, because, in all cases, the numbers of states and transitions of 
the minimized DFAs were > 99.4% of |Q| and |δ| in the table. Therefore, instead of trying 
to avoid the imperfections by fine-tuning the input (which would be difficult), we always 
used the first input DFA that our generator gave for the given parameters. 

The times given are the fastest and slowest of three measurements, made with |F| = 
|Q|/2 + d, where d ∈ {−1, 0, 1}. They are given in seconds. The number of transitions |δ| 
varies between 10% and 100% of |Q||Σ|. The times contain the special processing of 
unreachable and irrelevant states, but they do not contain the reading of the input DFA from 
and writing the result to a file. With |Q| = |Σ| = 10 000, Knuutila's algorithm ran out of 
memory, while our algorithm spent about 15 s when p = |δ|/(|Q||Σ|) = 10% and 32 s when 
p = 20%. 

Table 1: Running time measurements. |δ| = p|Q||Σ|, where p is given as %. 
A: |Q| = 1 000 and |Σ| = 100. B: |Q| = 1 000 and |Σ| = 1 000. 
C: |Q| = 10 000 and |Σ| = 100. D: |Q| = 10 000 and |Σ| = 1 000. 

     alg.   10%           30%           50%           70%           90%           100%
A    our    0.004 0.005   0.013 0.014   0.024 0.025   0.036 0.037   0.052 0.060   0.061 0.062
     Knu    0.026 0.026   0.034 0.035   0.040 0.041   0.045 0.046   0.048 0.049   0.053 0.054
B    our    0.059 0.061   0.277 0.279   0.549 0.551   0.855 0.865   1.181 1.211   1.330 1.416
     Knu    0.467 0.486   0.645 0.651   0.785 0.795   0.893 0.907   0.971 0.979   1.033 1.040
C    our    0.070 0.071   0.296 0.301   0.574 0.581   0.887 0.893   1.210 1.229   1.424 1.434
     Knu    0.526 0.529   0.730 0.734   0.901 0.904   1.027 1.035   1.128 1.130   1.200 1.202
D    our    1.224 1.238   4.038 4.087   7.132 7.164   10.50 10.57   14.18 14.34   16.41 16.48
     Knu    6.324 6.356   8.606 8.705   10.46 10.64   11.91 11.95   13.00 13.04   13.83 13.89

The superiority of our algorithm when p is small is clear. That our algorithm loses when 
p is big may be because it uses both F and Q − F in the initial splitters, whereas Knuutila's 
algorithm uses only one of them. Knuutila's algorithm also speeds up as p becomes smaller. 
Perhaps the reason is that when p is, say, 10%, the block that contains the sink state has 
a disproportionate number of input transitions, causing blocks to split into a small and a big 
part roughly in the ratio of 10% to 90%. Thus small blocks are introduced quickly. As a 
consequence, the average size of the splitters that the algorithm uses during the execution is 
smaller than when p = 100%. The same phenomenon also indirectly affects our algorithm, 
probably explaining why its running time is not linear in p. 

Of the three notions of "smaller" mentioned in the introduction, our analysis does not 
apply to the other two. It seems that they would require making Split-block somewhat 
more complicated. This is a possible but probably unimportant topic for further work. 

One of our near-future goals is to publish a much more complicated, true O(m lg n) algorithm 
for the problem in [2], that is, the multi-relational coarsest partition problem. 